1. Configuring Kafka

	1.1 Running Kafka
	
		Log in to the Ambari console at localhost:8080 and start the Kafka service
		
	1.2 Configuring Kafka brokers
	
		/etc/kafka/2.6.3.0-235/0/server.properties
		
	1.3 Configuring Kafka topics
	
		1.3.1 Create, list and describe the topic "class9Topic" with only one partition and one replication factor
		
				/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic class9Topic
				/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181
				/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic class9Topic
		
		1.3.2 Create and describe the topic "replicatedClass9Topic" with only one partition and one replication factor
		
				/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic replicatedClass9Topic
				/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic replicatedClass9Topic
	
	1.4 Creating a message console producer
	
		1.4.1 Create one message console producer and publish two messages to the topic "class9Topic"
			
				/usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-producer.sh --broker-list sandbox-hdp.hortonworks.com:6667 --topic class9Topic
			
			The published messages:
				
				Class 9 Big Data Streaming Processing
				Kafka is one of the distributed message systems
			
		1.4.2 Open a new tab, log in to localhost:4200, and create a second message console producer. Prepare the following messages in the file message.txt
		
				Class 10 Spark ETL
				Class 6 ETL using NiFi
		
			Then run the following command:
			
				/usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-producer.sh --broker-list sandbox-hdp.hortonworks.com:6667 --topic class9Topic < /root/TrainingOnHDP/dataset/message.txt
		
	1.5 Creating a message console consumer
	
		1.5.1 Create the consumer to fetch the messages from the topic "class9Topic"
		
			1.5.1.1 Fetch all messages:
			
				/usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-consumer.sh --topic class9Topic --bootstrap-server sandbox-hdp.hortonworks.com:6667 --from-beginning
		
				You should see the same messages appear in the consumer terminal:
				
					Class 9 Big Data Streaming Processing
					Kafka is one of the distributed message systems
					Class 10 Spark ETL
					Class 6 ETL using NiFi
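			The --from-beginning flag works because a Kafka partition is an append-only log and a consumer simply picks a starting offset. A minimal Python sketch of that model (illustrative names, not Kafka APIs):

```python
# Minimal in-memory sketch of why --from-beginning replays every retained
# message: a partition is an append-only log, and a consumer chooses its
# starting offset. Class and method names here are illustrative.

class PartitionLog:
    def __init__(self):
        self.records = []          # append-only list; offset == list index

    def append(self, msg):
        self.records.append(msg)

    def read_from(self, offset):
        return self.records[offset:]

log = PartitionLog()
for msg in ["Class 9 Big Data Streaming Processing",
            "Kafka is one of the distributed message systems",
            "Class 10 Spark ETL",
            "Class 6 ETL using NiFi"]:
    log.append(msg)

from_beginning = log.read_from(0)               # like --from-beginning
latest_only = log.read_from(len(log.records))   # like the default (latest)

print(len(from_beginning))  # 4
print(len(latest_only))     # 0
```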
				
			1.5.1.2 Fetch just one message:
			
				Stop the previous consumer (CTRL+C)
				
				/usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-consumer.sh --topic class9Topic --bootstrap-server sandbox-hdp.hortonworks.com:6667 --max-messages 1

				Launch a new terminal and run the following command:
				
				/usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-producer.sh --broker-list sandbox-hdp.hortonworks.com:6667 --topic class9Topic < /root/TrainingOnHDP/dataset/message.txt
				
				Go back to the consumer terminal; you should see only one message there
				
			1.5.1.3 Fetch one message from an offset:
			
				Stop the previous consumer (CTRL+C)

				/usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-consumer.sh --topic class9Topic --bootstrap-server sandbox-hdp.hortonworks.com:6667 --max-messages 1 --formatter 'kafka.coordinator.GroupMetadataManager$OffsetsMessageFormatter'

			1.5.1.4 Fetch one message from a specific consumer group:
			
				Stop the previous consumer (CTRL+C)

				/usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-consumer.sh --topic class9Topic --bootstrap-server sandbox-hdp.hortonworks.com:6667 --new-consumer --consumer-property group.id=consumerGroup

				Go to 1.4.1 terminal to publish the following message:
				
					Class 8 OLAP over Hadoop
				
				You should see the same message appear in the 1.5.1.4 consumer terminal
				
				Stop the 1.5.1.4 consumer (CTRL+C)

				Go to the 1.4.1 terminal to publish the following messages:

					Class 7 NoSQL over Hadoop
					Class 5 Big Data Scheduling
					
				Then go to the 1.5.1.4 terminal to run the consumer:

				/usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-consumer.sh --topic class9Topic --bootstrap-server sandbox-hdp.hortonworks.com:6667 --new-consumer --consumer-property group.id=consumerGroup
				
				You should see only the following messages, because the last committed offset is kept for the same consumer group:

					Class 7 NoSQL over Hadoop
					Class 5 Big Data Scheduling
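				The behaviour above can be sketched in Python: the broker remembers the last committed offset per consumer group, so a consumer rejoining the same group resumes after the messages it already read. A simplified model, not the real Kafka protocol:

```python
# Simplified sketch of consumer-group offset tracking: the broker stores
# the next offset to read per group, so a restarted consumer in the same
# group only sees messages produced after its last commit. Illustrative only.

class Broker:
    def __init__(self):
        self.log = []
        self.committed = {}        # group.id -> next offset to read

    def produce(self, msg):
        self.log.append(msg)

    def consume(self, group):
        start = self.committed.get(group, 0)
        batch = self.log[start:]
        self.committed[group] = len(self.log)  # commit after reading
        return batch

b = Broker()
b.produce("Class 8 OLAP over Hadoop")
first = b.consume("consumerGroup")      # sees the first message

b.produce("Class 7 NoSQL over Hadoop")
b.produce("Class 5 Big Data Scheduling")
second = b.consume("consumerGroup")     # only the two new messages

print(first)   # ['Class 8 OLAP over Hadoop']
print(second)  # ['Class 7 NoSQL over Hadoop', 'Class 5 Big Data Scheduling']
```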
				
				
	1.6 Configuring the broker settings
	
		/etc/kafka/2.6.3.0-235/0/server.properties	
		
			listeners=PLAINTEXT://sandbox-hdp.hortonworks.com:6667
			port=6667
			log.dirs=/kafka-logs
			advertised.listeners=
		
	1.7 Configuring threads and performance
	
		/etc/kafka/2.6.3.0-235/0/server.properties	
	
			message.max.bytes=1000000
			num.network.threads=3
			num.io.threads=8
			background.threads=10
			queued.max.requests=500
			socket.send.buffer.bytes=102400
			socket.receive.buffer.bytes=102400
			socket.request.max.bytes=104857600
			num.partitions=1

	1.8 Configuring the log settings
	
		/etc/kafka/2.6.3.0-235/0/server.properties	
			
			log.segment.bytes=1073741824
			log.roll.hours=168
			log.cleanup.policy=delete
			log.retention.hours=168
			log.retention.bytes=-1
			log.retention.check.interval.ms=30000
			log.cleaner.enable=false
			log.cleaner.threads=1
			log.cleaner.backoff.ms=15000
			log.index.size.max.bytes=10485760
			log.index.interval.bytes=4096
			log.flush.interval.messages=Long.MaxValue
			log.flush.interval.ms=Long.MaxValue
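		Of the settings above, log.segment.bytes and log.roll.hours are the two triggers that close the active log segment; whichever fires first wins. A small Python sketch of that decision (the function name is illustrative):

```python
# Sketch of the two segment-roll triggers from the config above: a segment
# is closed when it reaches log.segment.bytes OR has been open for
# log.roll.hours, whichever happens first.

LOG_SEGMENT_BYTES = 1073741824        # 1 GiB
LOG_ROLL_HOURS = 168                  # 7 days

def should_roll(segment_bytes: int, open_hours: float) -> bool:
    return segment_bytes >= LOG_SEGMENT_BYTES or open_hours >= LOG_ROLL_HOURS

print(should_roll(2 * 1024**3, 1))    # True  (size trigger)
print(should_roll(1024, 200))         # True  (time trigger)
print(should_roll(1024, 1))           # False (neither trigger)
```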
	
	1.9 Configuring the replica settings

		/etc/kafka/2.6.3.0-235/0/server.properties	

			default.replication.factor=1
			replica.lag.time.max.ms=10000
			replica.fetch.max.bytes=1048576
			replica.fetch.wait.max.ms=500
			num.replica.fetchers=1
			replica.high.watermark.checkpoint.interval.ms=5000
			fetch.purgatory.purge.interval.requests=1000
			producer.purgatory.purge.interval.requests=1000
			replica.socket.timeout.ms=30000
			replica.socket.receive.buffer.bytes=65536
		
	1.10 Configuring the Zookeeper settings

		/etc/kafka/2.6.3.0-235/0/server.properties	
	
			zookeeper.connect=sandbox-hdp.hortonworks.com:2181
			zookeeper.session.timeout.ms=6000
			zookeeper.connection.timeout.ms=6000
			zookeeper.sync.time.ms=2000
	
	1.11 Configuring other miscellaneous parameters

		/etc/kafka/2.6.3.0-235/0/server.properties	
	
			auto.create.topics.enable=true
			controlled.shutdown.enable=true
			controlled.shutdown.max.retries=3
			controlled.shutdown.retry.backoff.ms=5000
			auto.leader.rebalance.enable=true
			leader.imbalance.per.broker.percentage=10
			leader.imbalance.check.interval.seconds=300
			offset.metadata.max.bytes=4096
			max.connections.per.ip=Int.MaxValue
			connections.max.idle.ms=600000
			unclean.leader.election.enable=true
			offsets.topic.num.partitions=50
			offsets.retention.minutes=1440
			offsets.retention.check.interval.ms=600000
			offsets.topic.replication.factor=3
			offsets.topic.segment.bytes=104857600
			offsets.load.buffer.size=5242880
			offsets.commit.required.acks=-1
			offsets.commit.timeout.ms=5000
	
2. Message Validation

	2.1 Create the source and target topics
	
		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic source-topic
		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic good-topic
		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic bad-topic

	2.2 List all topics

		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181	
		
	2.3 Launch a new command-line terminal and start the console producer on the source-topic topic, where the input messages will be typed

		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-producer.sh --broker-list sandbox-hdp.hortonworks.com:6667 --topic source-topic

	2.4 Launch a new command-line terminal and start the console consumer listening to good-topic
	
		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-consumer.sh --bootstrap-server sandbox-hdp.hortonworks.com:6667 --from-beginning --topic good-topic

	2.5 Launch a new command-line terminal and start the console consumer listening to bad-topic
	
		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-consumer.sh --bootstrap-server sandbox-hdp.hortonworks.com:6667 --from-beginning --topic bad-topic

	2.6 Launch a new command-line terminal and start up the processing application
	
		java -cp /root/TrainingOnHDP/StreamingApplicationOnKafka/target/StreamingApplicationOnKafka-1.0-SNAPSHOT-jar-with-dependencies.jar ca.training.bigdata.kafka.validation.ProcessingApp sandbox-hdp.hortonworks.com:6667 consumerGroup source-topic good-topic bad-topic
		
	2.7 Go to the 2.3 command-line terminal, then send the following three messages:

		{"event": "CUSTOMER_SEES_BTCPRICE", "customer": {"id": "86689427", "name":"Edward S.", "ipAddress": "95.31.18.119"}, "currency": {"name": "bitcoin","price": "USD"}, "timestamp": "2017-07-03T12:00:35Z"}
		{"event": "CUSTOMER_SEES_BTCPRICE", "customer": {"id": "18313440", "name":"Julian A.", "ipAddress": "185.86.151.11"}, "currency": {"name": "bitcoin","price": "USD"}, "timestamp": "2017-07-04T15:00:35Z"}
		{"event": "CUSTOMER_SEES_BTCPRICE", "customer": {"id": "56886468", "name":"Lindsay M.", "ipAddress": "186.46.129.15"}, "currency": {"name":"bitcoin", "price": "USD"}, "timestamp": "2017-07-11T19:00:35Z"}

		The messages will show up in the 2.4 command-line terminal
		
		Send the following message:
		
			This is the bad message
			
		The message will show up in the 2.5 command-line terminal
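		The routing behaviour above can be sketched in Python, assuming "valid" simply means the payload parses as JSON (the actual ProcessingApp is a Java application and may check more than this): messages that parse go to good-topic, everything else to bad-topic.

```python
import json

# Sketch of the validation routing rule, under the stated assumption that
# "valid" means "parses as JSON". The topic names match the lab setup.

def route(message: str) -> str:
    try:
        json.loads(message)
        return "good-topic"
    except ValueError:
        return "bad-topic"

ok = '{"event": "CUSTOMER_SEES_BTCPRICE", "customer": {"id": "86689427"}}'
print(route(ok))                          # good-topic
print(route("This is the bad message"))   # bad-topic
```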
		
		
		
3. Message Enrichment

	3.1 Download a free copy of the MaxMind GeoIP database with these commands:
	
		cd /root/TrainingOnHDP/dataset
		wget "http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz"
		gunzip GeoLiteCity.dat.gz

	3.2 Go to the Open Exchange Rates page at: https://openexchangerates.org/. Register for a free plan to obtain your free API key	
	
	3.3 Launch a new command-line terminal and start the console producer on the source-topic topic, where the input messages will be typed

		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-producer.sh --broker-list sandbox-hdp.hortonworks.com:6667 --topic source-topic
		
	3.4 Launch a new command-line terminal and start the console consumer listening to good-topic
	
		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-consumer.sh --bootstrap-server sandbox-hdp.hortonworks.com:6667 --from-beginning --topic good-topic

	3.5 Launch a new command-line terminal and start the console consumer listening to bad-topic
	
		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-consumer.sh --bootstrap-server sandbox-hdp.hortonworks.com:6667 --from-beginning --topic bad-topic

	3.6 Launch a new command-line terminal and start up the processing application
	
		java -cp /root/TrainingOnHDP/StreamingApplicationOnKafka/target/StreamingApplicationOnKafka-1.0-SNAPSHOT-jar-with-dependencies.jar ca.training.bigdata.kafka.enrichment.ProcessingApp sandbox-hdp.hortonworks.com:6667 consumerGroup source-topic good-topic bad-topic
			
	3.7 Go to the 3.3 command-line terminal, then send the following three messages:

		{"event": "CUSTOMER_SEES_BTCPRICE", "customer": {"id": "86689427", "name":"Edward S.", "ipAddress": "95.31.18.119"}, "currency": {"name": "bitcoin","price": "USD"}, "timestamp": "2017-07-03T12:00:35Z"}
		{"event": "CUSTOMER_SEES_BTCPRICE", "customer": {"id": "18313440", "name":"Julian A.", "ipAddress": "185.86.151.11"}, "currency": {"name": "bitcoin","price": "USD"}, "timestamp": "2017-07-04T15:00:35Z"}
		{"event": "CUSTOMER_SEES_BTCPRICE", "customer": {"id": "56886468", "name":"Lindsay M.", "ipAddress": "186.46.129.15"}, "currency": {"name":"bitcoin", "price": "USD"}, "timestamp": "2017-07-11T19:00:35Z"}

		The messages will show up in the 3.4 command-line terminal
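		The enrichment step can be sketched in Python under stated assumptions: the real application uses the MaxMind GeoIP database for the customer's location and the Open Exchange Rates API for the currency rate; here both lookups are replaced by hypothetical in-memory tables so the example is self-contained.

```python
import json

# Sketch of message enrichment. GEO and RATES are hypothetical stand-ins
# for the MaxMind GeoIP lookup and the Open Exchange Rates API call; the
# added "country" and "rate" fields illustrate the idea, not the real schema.

GEO = {"95.31.18.119": "RU", "185.86.151.11": "GB", "186.46.129.15": "EC"}
RATES = {"USD": 1.0}   # placeholder rate table

def enrich(message: str) -> dict:
    event = json.loads(message)
    ip = event["customer"]["ipAddress"]
    event["customer"]["country"] = GEO.get(ip, "unknown")
    event["currency"]["rate"] = RATES.get(event["currency"]["price"])
    return event

msg = ('{"event": "CUSTOMER_SEES_BTCPRICE", "customer": {"id": "86689427", '
       '"name": "Edward S.", "ipAddress": "95.31.18.119"}, "currency": '
       '{"name": "bitcoin", "price": "USD"}, "timestamp": "2017-07-03T12:00:35Z"}')
print(enrich(msg)["customer"]["country"])  # RU
```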

		
4. Managing Kafka

	4.1 Managing consumer groups
	
		4.1.1 List the consumer groups

			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-consumer-groups.sh --bootstrap-server sandbox-hdp.hortonworks.com:6667 --list
		
			If the old high-level consumers are used and the group metadata is stored in ZooKeeper (with the offsets.storage=zookeeper flag), specify --zookeeper instead of --bootstrap-server, as follows:

			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-consumer-groups.sh --zookeeper localhost:2181 --list
			
		
		4.1.2 To see the offsets, use describe on the consumer group as follows:
			
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-consumer-groups.sh --bootstrap-server sandbox-hdp.hortonworks.com:6667 --describe --group consumerGroup
		
	4.2 Dumping log segments
	
		The DumpLogSegments command parses the log file and dumps its contents to the console; it is useful for debugging a seemingly corrupt log segment.
		
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --files /kafka-logs/class9Topic-0/00000000000000000000.log
	
	4.3 Importing ZooKeeper offsets
	
		If you have a backup of the offsets stored in ZooKeeper at some point in time, you can restore them. This tool pair first exports the offsets to a file and then imports them, restoring the offsets to the point when the backup was taken.
	
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-run-class.sh kafka.tools.ExportZkOffsets --group consumerGroup --output-file /tmp/zkoffset.txt --zkconnect localhost:2181
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-run-class.sh kafka.tools.ImportZkOffsets --input-file /tmp/zkoffset.txt --zkconnect localhost:2181
	
	4.4 Using the GetOffsetShell
	
		The GetOffsetShell tool fetches the log offsets for a topic's partitions (--time -1 returns the latest offsets, -2 the earliest)
		
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list sandbox-hdp.hortonworks.com:6667 --topic class9Topic --time -1
	
	4.5 Using the JMX tool
	
		The JMX tool dumps JMX metric values to standard output (you first need to add JMX_PORT=15500 to the Kafka config and restart the Kafka service)

			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://:15500/jmxrmi
		
	4.6 Using the MirrorMaker tool
	
		The MirrorMaker tool is useful when we need to replicate the same data in a different cluster; it continuously copies data between two Kafka clusters
		
		This lab requires a second cluster; skip it if you don't have one
		
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config config/consumer.config --producer.config config/producer.config --whitelist class9Topic		
	
	4.7 Replaying log producer
	
		The ReplayLogProducer tool is used to move data from one topic to another.
		
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-run-class.sh kafka.tools.ReplayLogProducer --sync --broker-list sandbox-hdp.hortonworks.com:6667 --inputtopic class9Topic --outputtopic replicatedClass9Topic --zookeeper localhost:2181
	
		The data should be moved to replicatedClass9Topic even if a ConsumerTimeoutException occasionally occurs
	
	4.8 Using state change log merger
	
		The StateChangeLogMerger tool merges the state change logs from several brokers into a unified history of what happened, for easy later analysis.

		This lab requires a second cluster; skip it if you don't have one
		
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-run-class.sh kafka.tools.StateChangeLogMerger --log-regex /tmp/state-change.log* --partitions 0,1,2 --topic class9Topic		

5. Operating Kafka

	5.1 Adding or removing topics
	
		5.1.1 Create the topic 'test-topic'
		
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic test-topic --partitions 5 --replication-factor 1
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test-topic
		
		5.1.2 Delete the topic 'test-topic'
		
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic test-topic
			
			Note: delete.topic.enable (Kafka configuration page on the Ambari console) must be set to true, and the Kafka service restarted
		
	5.2 Modifying message topics
	
		5.2.1 Change delete.retention.ms to 10 seconds and delete the retention.ms configuration

			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic test1-topic --partitions 1 --replication-factor 1
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic test1-topic --partitions 5 --config delete.retention.ms=10000 --delete-config retention.ms
	
			Note: Kafka does not support reducing the number of partitions for a topic
			
		5.2.2 To add a config to a topic, run the following

			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name test-topic --alter --add-config retention.ms=1000
			
		5.2.3 To remove a config from a topic, run the following

			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name test-topic --alter --delete-config retention.ms		
	
	5.3 Implementing a graceful shutdown
	
		5.3.1 Add the following to the Kafka config and restart the Kafka service
			
			controlled.shutdown.enable=true
	
	5.4 Balancing leadership
	
		5.4.1 Run the following command:
		
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181
	
		5.4.2 Or add the following to the Kafka config and restart the Kafka service
	
			auto.leader.rebalance.enable = true
	
	5.5 Expanding clusters
	
		This lab moves partitions between brokers; since we don't have a multi-broker Kafka cluster, it can be skipped
		
		5.5.1 Prepare the following JSON file, named to_reassign.json
			
			{"topics": [{"topic": "topic_1"},
						{"topic": "topic_2"}],
				"version":1
			}

		5.5.2 Run the following command to generate the assignment
		
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --topics-to-move-json-file to_reassign.json --broker-list "7,8" --generate
	
		5.5.3 Save the output as custom-assignment.json; you can edit this file if needed
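		A rough Python sketch of what --generate produces for the scenario above: each topic's partitions are spread over the target brokers (7 and 8). The round-robin here is an illustration, not Kafka's exact placement algorithm, and the partition counts are assumed.

```python
import json

# Illustrative sketch of generating a reassignment plan: spread each topic's
# partitions over the target broker list. Not Kafka's actual placement logic.

def generate_assignment(topics: dict, brokers: list) -> dict:
    partitions = []
    for topic, num_partitions in topics.items():
        for p in range(num_partitions):
            leader = brokers[p % len(brokers)]   # simple round-robin
            partitions.append({"topic": topic, "partition": p,
                               "replicas": [leader]})
    return {"version": 1, "partitions": partitions}

# Hypothetical partition counts for topic_1 and topic_2
plan = generate_assignment({"topic_1": 2, "topic_2": 1}, [7, 8])
print(json.dumps(plan, indent=2))
```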
		
		5.5.4 Run the following command to execute the reassignment
		
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file custom-assignment.json --execute
		
		5.5.5 Run the following command to verify the partition assignment
		
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file custom-assignment.json --verify
	
	5.6 Increasing the replication factor
	
		Since we don't have a multi-broker Kafka cluster, this lab can be skipped
	
		5.6.1 Create a JSON file named increase-replication.json with this code
		
			{"version":1,
				"partitions":[{"topic":"test-topic","partition":0,"replicas":[3,4,5,6]
			}]}
	
		5.6.2 Run the following to execute the reassignment
		
			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file increase-replication.json --execute

	
	5.7 Decommissioning brokers
	
		5.7.1 Create a JSON file named change-replication.json with the following content:
			
			{"version":1,
			"partitions":[{"topic":"test-topic","partition":0,"replicas":[1,2]}]}
	
		5.7.2 Reassign the topic to the two remaining brokers with the reassign-partitions command:

			/usr/hdp/2.6.3.0-235/kafka/bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file change-replication.json --execute
	
	5.8 Checking the consumer position	
	
		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-consumer-groups.sh --bootstrap-server sandbox-hdp.hortonworks.com:6667 --describe --group consumerGroup
	
	5.9 Zookeeper Tips
	
		/usr/hdp/2.6.3.0-235/zookeeper/bin/zkCli.sh -server localhost:2181
		ls /
		get /consumers
		
	5.10 Config

		5.10.1 Show all configuration overrides for a topic
		
		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-configs.sh --zookeeper localhost:2181 --describe --entity-type topics --entity-name class9Topic
		
	5.11 Performance

		5.11.1 Producer
		
		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-producer-perf-test.sh --topic class9Topic --broker-list sandbox-hdp.hortonworks.com:6667 --messages 20000
		
	5.12 ACLs

		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:elastic --producer --topic class9Topic
		
		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 --list --topic class9Topic
		
		
	
6. Monitoring

	6.1 Monitoring server statistics
	
		6.1.1 Add JMX_PORT=15500 to the Kafka config and restart the Kafka service
		
		6.1.2 Run the following from the Windows command line:
		
			jconsole localhost:15500
			
			choose Local Process (15500)
			
	6.2 Monitoring producer statistics
	
		6.2.1 Run the following command
		
			JMX_PORT=15501 /usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-producer.sh --broker-list sandbox-hdp.hortonworks.com:6667 --topic test_topic

		6.2.2 Run the following from the Windows command line:
		
			jconsole localhost:15501
			
			choose Local Process (15501)

	6.3 Monitoring consumer statistics
	
		6.3.1 Run the following command
		
			JMX_PORT=15502 /usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-consumer.sh --bootstrap-server sandbox-hdp.hortonworks.com:6667 --from-beginning --topic test_topic

		6.3.2 Run the following from the Windows command line:
		
			jconsole localhost:15502
			
			choose Local Process (15502)

	
	6.4 Monitoring with the help of Graphite (Need to download and install Graphite)
	
		Add these lines to the server.properties file and restart the Kafka service

			kafka.metrics.reporters=com.criteo.kafka.KafkaGraphiteMetricsReporter
			kafka.graphite.metrics.reporter.enabled=true
			kafka.graphite.metrics.host=localhost
			kafka.graphite.metrics.port=8649
			kafka.graphite.metrics.group=kafka

	6.5 Monitoring with the help of Ganglia	(Need to download and install Ganglia)

		Add these lines to the server.properties file and restart the Kafka service
	
			kafka.metrics.reporters=com.criteo.kafka.KafkaGangliaMetricsReporter
			kafka.ganglia.metrics.reporter.enabled=true
			kafka.ganglia.metrics.host=localhost
			kafka.ganglia.metrics.port=8649
			kafka.ganglia.metrics.group=kafka
	
7. Creating Custom Partition for Producer	

	7.1 Create partition topic
	
		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic partition-topic
	
	7.2 Launch a new command-line terminal to start the process

		java -cp /root/TrainingOnHDP/StreamingApplicationOnKafka/target/StreamingApplicationOnKafka-1.0-SNAPSHOT-jar-with-dependencies.jar ca.training.bigdata.kafka.partition.ProcessingApp
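	The decision a custom partitioner makes can be sketched in Python, assuming the common pattern: hash the message key and take it modulo the partition count, so the same key always lands on the same partition of partition-topic. The real partitioner in ca.training.bigdata.kafka.partition is Java code and may use a different rule.

```python
import zlib

# Sketch of a key-based partitioner. crc32 stands in for Kafka's murmur2
# hash; the modulo idea is the same. NUM_PARTITIONS matches the 3 partitions
# that partition-topic was created with in step 7.1.

NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

p1 = partition_for("customer-86689427")   # hypothetical key
p2 = partition_for("customer-86689427")
print(p1 == p2)                  # True: same key -> same partition
print(0 <= p1 < NUM_PARTITIONS)  # True: always a valid partition index
```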

	
8. Creating Kafka Producer and Consumer Example

	8.1 Create demo topic
	
		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic demoTopic
	
	8.2 Launch a new command-line terminal to start the consumer process

		java -cp /root/TrainingOnHDP/StreamingApplicationOnKafka/target/StreamingApplicationOnKafka-1.0-SNAPSHOT-jar-with-dependencies.jar ca.training.bigdata.kafka.demo.DemoConsumer
		
	8.3 Launch a new command-line terminal to start the producer process
	
		java -cp /root/TrainingOnHDP/StreamingApplicationOnKafka/target/StreamingApplicationOnKafka-1.0-SNAPSHOT-jar-with-dependencies.jar ca.training.bigdata.kafka.demo.DemoProducer
	

9. Building Streaming Process Applications using Kafka

	9.1 Elasticsearch Installation (Skip it if already installed)
	
		9.1.1 Installation URL
		
			https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.1.1.tar.gz
	
		9.1.2 Upload elasticsearch-6.1.1.tar.gz to /root/TrainingOnHDP/ on HDP sandbox

		9.1.3 Log in to localhost:4200 and unpack the file on the HDP sandbox

			cd /root/TrainingOnHDP/
			tar xvzf elasticsearch-6.1.1.tar.gz
		
		9.1.4 Open elasticsearch.yml at /root/TrainingOnHDP/elasticsearch-6.1.1/config and set the following port (the default, 9200, is kept)

			http.port: 9200
			
		9.1.5 Start Elasticsearch (it must run as a non-root user, hence the elastic account)

			useradd elastic
			passwd elastic
			su elastic
			
			/root/TrainingOnHDP/elasticsearch-6.1.1/bin/elasticsearch		
	
	9.2 Kibana Installation (Skip it if already installed)
		
		9.2.1 Installation URL
		
			https://artifacts.elastic.co/downloads/kibana/kibana-6.1.1-linux-x86_64.tar.gz
	
		9.2.2 Upload kibana-6.1.1-linux-x86_64.tar.gz to /root/TrainingOnHDP/ on HDP sandbox

		9.2.3 Log in to localhost:4200 and unpack the file on the HDP sandbox

			cd /root/TrainingOnHDP/
			tar xvzf kibana-6.1.1-linux-x86_64.tar.gz
			
		9.2.4 Open kibana.yml at /root/TrainingOnHDP/kibana-6.1.1-linux-x86_64/config and make the following port change (default was 5601)

			server.port: 8744
			server.host: "0.0.0.0"
			
		9.2.5 Start Kibana
		
			/root/TrainingOnHDP/kibana-6.1.1-linux-x86_64/bin/kibana
			
		9.2.6 Go to the Kibana console at http://localhost:8744 and create the index pattern kafka-streaming-index
		
	9.3 Log in to the Ambari console and start the Kafka and Hive services

	9.4 Launch a new command-line terminal, enter hive, and create the Hive database 'kafka':

		create database kafka;
		
	9.5 Launch a new command-line terminal and start the console producer on the source-topic topic, where the input messages will be typed

		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-console-producer.sh --broker-list sandbox-hdp.hortonworks.com:6667 --topic source-topic	
		
		Note: if something goes wrong, you may need to delete and recreate the topics "good-topic" and "bad-topic"

	9.6 Launch a new command-line terminal and start up the processing application
	
		java -cp /root/TrainingOnHDP/StreamingApplicationOnKafka/target/StreamingApplicationOnKafka-1.0-SNAPSHOT-jar-with-dependencies.jar ca.training.bigdata.kafka.enrichment.ProcessingApp sandbox-hdp.hortonworks.com:6667 consumerGroup source-topic good-topic bad-topic
		
	9.7 Launch a new command-line terminal and choose one of the following options to start the streaming process application
	
		Option 1

		spark-submit --class ca.training.bigdata.kafka.streaming.SparkStreamingApp --driver-memory 2G --executor-memory 2G --master local[1] /root/TrainingOnHDP/StreamingApplicationOnKafka/target/StreamingApplicationOnKafka-1.0-SNAPSHOT-jar-with-dependencies.jar
		
		Option 2
		
		spark-submit --class ca.training.bigdata.kafka.streaming.SparkStructuredStreamingApp --driver-memory 2G --executor-memory 2G --master local[1] /root/TrainingOnHDP/StreamingApplicationOnKafka/target/StreamingApplicationOnKafka-1.0-SNAPSHOT-jar-with-dependencies.jar
		
	9.8 Go to the 9.5 command-line terminal, then send the following three messages:

		{"event": "CUSTOMER_SEES_BTCPRICE", "customer": {"id": "86689427", "name":"Edward S.", "ipAddress": "95.31.18.119"}, "currency": {"name": "bitcoin","price": "USD"}, "timestamp": "2017-07-03T12:00:35Z"}
		{"event": "CUSTOMER_SEES_BTCPRICE", "customer": {"id": "18313440", "name":"Julian A.", "ipAddress": "185.86.151.11"}, "currency": {"name": "bitcoin","price": "USD"}, "timestamp": "2017-07-04T15:00:35Z"}
		{"event": "CUSTOMER_SEES_BTCPRICE", "customer": {"id": "56886468", "name":"Lindsay M.", "ipAddress": "186.46.129.15"}, "currency": {"name":"bitcoin", "price": "USD"}, "timestamp": "2017-07-11T19:00:35Z"}

		The messages will show up in the 3.4 command-line terminal
	
	9.9 Go to the 9.4 command-line terminal, then run the Hive query below if you chose option 1 in 9.7
	
		select * from kafka.customer_bitcoin;
		
	9.10 Log in to localhost:8744 (Kibana console), go to Management -> Index Patterns, and add the kafka-streaming-index pattern

		Next, go to Discover; you should be able to select kafka-streaming-index to search
		
	9.11 Launch a new command-line terminal if you chose option 2 in 9.7, then run:

		hadoop fs -ls /user/root/kafka
		
		You should be able to see some new files there
		
	9.12 NiFi 1.2.0 Installation (Skip it if already installed)

		9.12.1 Installation Guide URL
		
			https://hortonworks.com/downloads
		
		9.12.2 Go to the Ambari console at http://localhost:8080 and stop the NiFi service (old version)

		9.12.3 Upload nifi-1.2.0.3.0.2.0-76-bin.tar.gz to /root/TrainingOnHDP/ on HDP sandbox

		9.12.4 Log in to localhost:4200 and unpack the file on the HDP sandbox

			cd /root/TrainingOnHDP/
			tar xvzf nifi-1.2.0.3.0.2.0-76-bin.tar.gz
		
		9.12.5 Open nifi.properties at /root/TrainingOnHDP/nifi-1.2.0.3.0.2.0-76/conf and make the following port change (was 8080)

			nifi.web.http.port=9090	
	
		9.12.6 Run NiFi
	
			/root/TrainingOnHDP/nifi-1.2.0.3.0.2.0-76/bin/nifi.sh start
		
		9.12.7 Browse to http://localhost:9090/nifi
		
	9.13 Import NiFi template "Building_Streaming_Process_Application.xml" from /root/TrainingOnHDP/StreamingApplicationOnKafka/nifi and run it

	9.14 Go to the 9.5 command-line terminal, then send the following message:

		This is the test message
		
	9.15 Go to the 9.11 command-line terminal and run:
	
		hadoop fs -ls /user/root/kafka/error
		
		You should be able to see some new files there
		
	9.16 Log in to localhost:8744 (Kibana console), go to Management -> Index Patterns, and add the kafka-streaming-error-index pattern

		Next, go to Discover; you should be able to select kafka-streaming-error-index to search












	